Piano Genie

Authors: Dieleman Sander; Donahue Chris; Simon Ian
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 22/03/2019
We present Piano Genie, an intelligent controller which allows non-musicians to improvise on the piano. With Piano Genie, a user performs on a simple interface with eight buttons, and their performance is decoded into the space of plausible piano music in real time. To learn a suitable mapping procedure for this problem, we train recurrent neural network autoencoders with discrete bottlenecks: an encoder learns an appropriate sequence of buttons corresponding to a piano piece, and a decoder learns to map this sequence back to the original piece. During performance, we substitute a user's input for the encoder output, and play the decoder's prediction each time the user presses a button. To improve the intuitiveness of Piano Genie's performance behavior, we impose musically meaningful constraints over the encoder's outputs.

Comment: Published as a conference paper at ACM IUI 201
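
The discrete bottleneck described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how the encoder output is discretized, so `quantize_to_button` simply rounds a real value in [-1, 1] to one of eight button indices, and `toy_decoder` is a hypothetical fixed mapping standing in for the trained recurrent decoder. It only shows where a user's button presses would replace the encoder output at performance time.

```python
import math

NUM_BUTTONS = 8   # the abstract's eight-button interface
NUM_KEYS = 88     # keys on a standard piano

def quantize_to_button(x: float, num_buttons: int = NUM_BUTTONS) -> int:
    """Map a real-valued encoder output to a button index in 0..num_buttons-1.

    Assumption: the encoder emits one scalar per note, nominally in [-1, 1].
    """
    x = max(-1.0, min(1.0, x))                    # clamp out-of-range values
    scaled = (x + 1.0) / 2.0 * (num_buttons - 1)  # rescale to [0, num_buttons-1]
    return int(math.floor(scaled + 0.5))          # round half up

def toy_decoder(button: int) -> int:
    """Hypothetical stand-in for the trained decoder.

    Returns the middle key of the keyboard region assigned to the button.
    A real decoder would also condition on the notes played so far.
    """
    region = NUM_KEYS // NUM_BUTTONS              # 11 keys per button
    return button * region + region // 2

def perform(buttons: list[int]) -> list[int]:
    """At performance time the user's presses replace the encoder output."""
    return [toy_decoder(b) for b in buttons]

if __name__ == "__main__":
    encoder_outputs = [-1.0, -0.2, 0.4, 1.0]
    buttons = [quantize_to_button(x) for x in encoder_outputs]
    print(buttons, perform(buttons))
```

During training, the button sequence would come from `quantize_to_button` applied to the encoder; during performance, the same decoder is driven directly by the user's input.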

arXiv.org e-Print Archive

Crossref

Expediting TTS Synthesis with Adversarial Vocoding

Authors: Donahue Chris; Dubnov Shlomo; McAuley Julian; Neekhara Paarth; Puckette Miller
Publication date: 16/04/2019

Recent approaches in text-to-speech (TTS) synthesis employ neural network strategies to vocode perceptually-informed spectrogram representations directly into listenable waveforms. Such vocoding procedures create a computational bottleneck in modern TTS pipelines. We propose an alternative approach which utilizes generative adversarial networks (GANs) to learn mappings from perceptually-informed spectrograms to simple magnitude spectrograms which can be heuristically vocoded. Through a user study, we show that our approach significantly outperforms naïve vocoding strategies while being hundreds of times faster than neural network vocoders used in state-of-the-art TTS systems. We also show that our method can be used to achieve state-of-the-art results in unsupervised synthesis of individual words of speech.

Comment: Published as a conference paper at INTERSPEECH 201
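
To make the mapping problem concrete, the sketch below shows why recovering a magnitude spectrum from a perceptually-informed one is lossy. It is not the authors' method: the two-band averaging filterbank and the transpose-based inversion are assumptions chosen as one plausible naïve baseline, and the function names are hypothetical. In the approach the abstract describes, a learned GAN mapping would take the place of `naive_invert`.

```python
def to_perceptual(filterbank: list[list[float]], magnitude: list[float]) -> list[float]:
    """Compress a magnitude spectrum into fewer perceptual bands (matrix-vector product)."""
    return [sum(w * m for w, m in zip(row, magnitude)) for row in filterbank]

def naive_invert(filterbank: list[list[float]], bands: list[float]) -> list[float]:
    """Spread each band value back over its frequency bins.

    Uses the transposed filterbank, normalised by each bin's total weight.
    Detail within a band cannot be recovered this way.
    """
    num_bins = len(filterbank[0])
    estimate = []
    for j in range(num_bins):
        weight = sum(row[j] for row in filterbank)
        value = sum(row[j] * b for row, b in zip(filterbank, bands))
        estimate.append(value / weight if weight > 0 else 0.0)
    return estimate

if __name__ == "__main__":
    # Toy filterbank (an assumption): two bands, each averaging two adjacent bins.
    filterbank = [[0.5, 0.5, 0.0, 0.0],
                  [0.0, 0.0, 0.5, 0.5]]
    magnitude = [1.0, 3.0, 2.0, 6.0]
    bands = to_perceptual(filterbank, magnitude)
    print(bands)                            # band averages
    print(naive_invert(filterbank, bands))  # flattened estimate, fine detail lost
```

The estimated magnitude spectrum would then be passed to a heuristic phase-estimation step to produce a waveform; that step is omitted here.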

arXiv.org e-Print Archive

Crossref

eScholarship - University of California